327 research outputs found

    Ten simple rules for a community computational challenge.

    In science, the relationship between methods and discovery is symbiotic. As we discover more, we are able to construct more precise and sensitive tools and methods that enable further discovery. With better lens crafting came microscopes, and with them the discovery of living cells. In the last 40 years, advances in molecular biology, statistics, and computer science have ushered in the field of bioinformatics and the genomic era. Computational scientists enjoy developing new methods, and the community encourages them to do so. Indeed, the editorial guidelines for PLOS Computational Biology require manuscripts to apply novel methods. However, it is often confusing to know which method to choose: which method is best? And, in this context, what does “best” mean? To help choose an appropriate method for a particular task, scientists often form community-based challenges for the unbiased evaluation of methods in a given field. These challenges help evaluate existing and novel methods, while helping to coalesce a community and leading to new ideas and collaborations. In computational biology, the first of these challenges was arguably the Critical Assessment of protein Structure Prediction, or CASP [1], whose goal is to evaluate methods for pre

    Classification in biological networks with hypergraphlet kernels

    MOTIVATION: Biological and cellular systems are often modeled as graphs in which vertices represent objects of interest (genes, proteins and drugs) and edges represent relational ties between these objects (binds-to, interacts-with and regulates). This approach has been highly successful owing to the theory, methodology and software that support analysis and learning on graphs. Graphs, however, suffer from information loss when modeling physical systems due to their inability to accurately represent multiobject relationships. Hypergraphs, a generalization of graphs, provide a framework to mitigate information loss and unify disparate graph-based methodologies. RESULTS: We present a hypergraph-based approach for modeling biological systems and formulate vertex classification, edge classification and link prediction problems on (hyper)graphs as instances of vertex classification on (extended, dual) hypergraphs. We then introduce a novel kernel method on vertex- and edge-labeled (colored) hypergraphs for analysis and learning. The method is based on exact and inexact (via hypergraph edit distances) enumeration of hypergraphlets; i.e. small hypergraphs rooted at a vertex of interest. We empirically evaluate this method on fifteen biological networks and show its potential use in a positive-unlabeled setting to estimate the interactome sizes in various species. AVAILABILITY AND IMPLEMENTATION: https://github.com/jlugomar/hypergraphlet-kernels. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online
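
Recasting edge classification as vertex classification relies on the dual hypergraph, in which every original hyperedge becomes a vertex and every original vertex becomes a hyperedge. A minimal sketch of that standard construction follows, assuming a hypergraph stored as a Python dict from hyperedge id to vertex set; the representation and the toy protein names are illustrative assumptions, not taken from the hypergraphlet-kernels repository:

```python
from collections import defaultdict


def dual_hypergraph(hyperedges):
    """Return the dual of a hypergraph.

    `hyperedges` maps a hyperedge id to the set of vertices it contains
    (an assumed representation). In the dual, each original hyperedge id
    becomes a vertex, and each original vertex v becomes a hyperedge that
    contains the ids of all original hyperedges incident to v.
    """
    dual = defaultdict(set)
    for edge_id, vertices in hyperedges.items():
        for v in vertices:
            dual[v].add(edge_id)
    return dict(dual)


# Hypothetical toy network: e1 is a three-protein complex, e2 a pairwise interaction.
H = {"e1": {"A", "B", "C"}, "e2": {"C", "D"}}
D = dual_hypergraph(H)
# D == {"A": {"e1"}, "B": {"e1"}, "C": {"e1", "e2"}, "D": {"e2"}}
```

In the dual, labels attached to hyperedges (for example, interaction or complex types) become vertex labels, so a vertex classifier can be applied unchanged. Applying the function twice recovers the original hypergraph, provided every vertex belongs to at least one hyperedge.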

    Influence of Sequence Changes and Environment on Intrinsically Disordered Proteins

    Many large-scale studies on intrinsically disordered proteins are implicitly based on the structural models deposited in the Protein Data Bank. Yet, the static nature of deposited models supplies little insight into variation of protein structure and function under diverse cellular and environmental conditions. While the computational predictability of disordered regions provides practical evidence that disorder is an intrinsic property of proteins, the robustness of disordered regions to changes in sequence or environmental conditions has not been systematically studied. We analyzed intrinsically disordered regions in the same or similar proteins crystallized independently and studied their sensitivity to changes in protein sequence and parameters of crystallographic experiments. The observed changes in the existence, position, and length of disordered regions indicate that their appearance in X-ray structures dramatically depends on changes in amino acid sequence and peculiarities of the crystallographic experiment. Our study also raises general questions regarding protein evolution and the regulation of protein structure, dynamics, and function via variations in cellular and environmental conditions

    Global human frequencies of predicted nuclear pathogenic variants and the role played by protein hydrophobicity in pathogenicity potential

    Mitochondrial proteins are coded by nuclear (nDNA) and mitochondrial (mtDNA) genes, implying a complex cross-talk between the two genomes. Here we investigated the diversity displayed in 104 nuclear-coded mitochondrial proteins from 1,092 individuals from the 1000 Genomes dataset, in order to evaluate whether these genes are under the effects of purifying selection and how that selection compares with selection on their mitochondrial-encoded counterparts. Only the very rare variants (frequency < 0.1%) in these nDNA genes are indistinguishable from a random set of all possible variants in terms of predicted pathogenicity score, whereas more frequent variants display distinct signs of purifying selection. Comparisons of selection strength indicate stronger selection in the mtDNA genes than in this set of nDNA genes, which is accounted for by the high hydrophobicity of the proteins coded by the mtDNA. Most of the predicted pathogenic variants in the nDNA genes were restricted to a single continental population. The proportion of individuals having at least one potential pathogenic mutation in this gene set was significantly lower in Europeans than in Africans and Asians. This difference may reflect demographic asymmetries, since African and Asian populations experienced their main expansions in the middle Holocene, while in Europeans the main expansions occurred earlier, in the post-glacial period

    Potential functions of LEA proteins from the brine shrimp Artemia franciscana - Anhydrobiosis meets bioinformatics.

    Late embryogenesis abundant (LEA) proteins are a large group of anhydrobiosis-associated intrinsically disordered proteins (IDPs), which are commonly found in plants and some animals. The brine shrimp Artemia franciscana is the only known animal that expresses LEA proteins from three, rather than only one, different groups in its anhydrobiotic life stage. The reason for the higher complexity of the A. franciscana LEA proteome (LEAome), compared with other anhydrobiotic animals, remains mostly unknown. To address this issue, we have employed a suite of bioinformatics tools to evaluate the disorder status of the Artemia LEAome and to analyze the roles of intrinsic disorder in the functioning of brine shrimp LEA proteins. We show here that A. franciscana LEA proteins from different groups are more similar to each other than originally expected, while functional differences among members of group 3 are possibly larger than commonly anticipated. Our data show that, although these proteins are characterized by a large variety of forms and possible functions, as a general strategy A. franciscana utilizes glassy-matrix-forming LEAs concurrently with proteins that more readily interact with binding partners. It is likely that the function(s) of both types, the matrix-forming and partner-binding LEA proteins, are regulated by changing water availability during desiccation

    PINTA: a web server for network-based gene prioritization from expression data

    PINTA (available at http://www.esat.kuleuven.be/pinta/; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes based on the differential expression of their neighborhood in a genome-wide protein–protein interaction network. Our strategy is meant for biological and medical researchers aiming to identify novel disease genes using disease-specific expression data. PINTA supports both candidate gene prioritization (starting from a user-defined set of candidate genes) and genome-wide gene prioritization, and is available for five species (human, mouse, rat, worm and yeast). As input data, PINTA requires only disease-specific expression data, and various platforms (e.g. Affymetrix) are supported. As a result, PINTA computes a gene ranking and presents the results as a table that can easily be browsed and downloaded by the user
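
One simple way to rank genes by the differential expression of their network neighborhood is to average the absolute expression change over each gene and its direct interaction partners. The sketch below does exactly that; it is a hypothetical illustration of the general idea, not PINTA's actual scoring method, and the function name, gene names, and values are made up:

```python
def rank_by_neighborhood(expression_change, interactions):
    """Rank genes by the mean absolute expression change of each gene and its
    direct interaction partners (a simple, hypothetical neighborhood score).

    expression_change: dict gene -> differential expression statistic,
        e.g. a log2 fold change; genes with no value count as 0.
    interactions: iterable of (gene_a, gene_b) protein-protein interaction pairs.
    """
    neighbors = {gene: set() for gene in expression_change}
    for a, b in interactions:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    scores = {}
    for gene, partners in neighbors.items():
        members = {gene} | partners
        total = sum(abs(expression_change.get(g, 0.0)) for g in members)
        scores[gene] = total / len(members)

    # Highest score first; ties broken by gene name so the order is reproducible.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical input: G4 is unchanged itself but interacts with the strongly changed G2.
expr = {"G1": 2.0, "G2": -3.0, "G3": 0.1, "G4": 0.0}
ppi = [("G1", "G2"), ("G3", "G4"), ("G4", "G2")]
ranking = [gene for gene, _ in rank_by_neighborhood(expr, ppi)]
# ranking == ["G1", "G2", "G4", "G3"]
```

G4 shows no expression change itself but ranks above G3 because it interacts with G2, which is the kind of neighborhood signal such methods exploit. Candidate-gene prioritization would simply filter this genome-wide ranking down to a user-supplied gene set.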

    Analysis of AML genes in dysregulated molecular networks

    Background: Identifying disease-causing genes and understanding their molecular mechanisms are essential to developing effective therapeutics. Thus, several computational methods have been proposed to prioritize candidate disease genes by integrating different data types, including sequence information, biomedical literature, and pathway information. Recently, molecular interaction networks have been incorporated to predict disease genes, but most of those methods do not utilize the invaluable disease-specific information available in mRNA expression profiles of patient samples. Results: Through the integration of protein-protein interaction networks and gene expression profiles of acute myeloid leukemia (AML) patients, we identified subnetworks of interacting proteins dysregulated in AML and characterized the known genes whose mutations are causally implicated in AML that are embedded in these subnetworks. The analysis shows that the set of extracted subnetworks is a reservoir rich in AML genes reflecting key leukemogenic processes such as myeloid differentiation. Conclusion: We showed that an integrative approach utilizing both gene expression profiles and molecular networks could identify AML-causing genes, most of which were not detectable with gene expression analysis alone owing to their minor changes in mRNA level.

    MutDB: update on development of tools for the biochemical analysis of genetic variation

    Understanding how genetic variation affects the molecular function of gene products is an emerging area of bioinformatic research. Here, we present updates to MutDB (http://www.mutdb.org), a tool that aims to aid bioinformatic studies by integrating publicly available databases of human genetic variation with molecular features and clinical phenotype data. MutDB, first developed in 2002, integrates annotated SNPs in dbSNP and amino acid substitutions in Swiss-Prot with protein structural information, links to scores that predict functional disruption, and other useful annotations. Though these functional annotations are mainly focused on nonsynonymous SNPs, some information on other SNP types included in dbSNP is also provided. Additionally, we have developed new functionality that facilitates KEGG pathway visualization of genes containing SNPs, and a SNP query tool for visualizing and exporting sets of SNPs that share selected features based on certain filters

    The Rewiring of Ubiquitination Targets in a Pathogenic Yeast Promotes Metabolic Flexibility, Host Colonization and Virulence

    Funding: This work was funded by the European Research Council [http://erc.europa.eu/], AJPB (STRIFE Advanced Grant; C-2009-AdG-249793). The work was also supported by: the Wellcome Trust [www.wellcome.ac.uk], AJPB (080088, 097377); the UK Biotechnology and Biological Sciences Research Council [www.bbsrc.ac.uk], AJPB (BB/F00513X/1, BB/K017365/1); the CNPq-Brazil [http://cnpq.br], GMA (Science without Borders fellowship 202976/2014-9); and the National Centre for the Replacement, Refinement and Reduction of Animals in Research [www.nc3rs.org.uk], DMM (NC/K000306/1). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Acknowledgments: We thank Dr. Elizabeth Johnson (Mycology Reference Laboratory, Bristol) for providing strains, the Aberdeen Proteomics facility for the biotyping of S. cerevisiae clinical isolates, and Euroscarf for providing S. cerevisiae strains and plasmids. We are grateful to our Microscopy Facility in the Institute of Medical Sciences for their expert help with the electron microscopy, and to our friends in the Aberdeen Fungal Group for insightful discussions.

    Exploring and Exploiting Disease Interactions from Multi-Relational Gene and Phenotype Networks

    The availability of electronic health care records is unlocking the potential for novel studies on understanding and modeling disease co-morbidities based on both phenotypic and genetic data. Moreover, the emergence of increasingly reliable phenotypic data can aid further studies investigating the potential genetic links among diseases. The goal is to create a feedback loop where computational tools guide and facilitate research, leading to improved biological knowledge and clinical standards, which in turn should generate better data. We build and analyze disease interaction networks based on data collected from previous genetic association studies and patient medical histories, spanning more than 12 years, acquired from a regional hospital. By exploring both individual and combined interactions among these two levels of disease data, we provide novel insight into the interplay between genetics and clinical realities. Our results show a marked difference between the well-defined structure of genetic relationships and the chaotic co-morbidity network, but also highlight clear interdependencies. We demonstrate the power of these dependencies by proposing a novel multi-relational link prediction method, showing that disease co-morbidity can enhance our currently limited knowledge of genetic association. Furthermore, our methods for integrated networks of diverse data are widely applicable and can provide novel advances for many problems in systems biology and personalized medicine
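
As a toy illustration of multi-relational link prediction (not the authors' method), candidate genetic links between diseases can be scored by the number of common neighbors in the genetic network plus a weighted number of common neighbors in the co-morbidity network. The weight `alpha`, the function names, and the disease labels below are hypothetical:

```python
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency map (node -> set of neighbors) from edge pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def rank_candidate_links(genetic_edges, comorbidity_edges, alpha=0.5):
    """Rank disease pairs that lack a known genetic link.

    score = common neighbors in the genetic network
            + alpha * common neighbors in the co-morbidity network
    `alpha` is a hypothetical weight for the clinical (co-morbidity) layer.
    """
    gen = adjacency(genetic_edges)
    com = adjacency(comorbidity_edges)
    diseases = sorted(set(gen) | set(com))
    empty = set()
    scored = []
    for a, b in combinations(diseases, 2):
        if b in gen.get(a, empty):
            continue  # already a known genetic association
        score = len(gen.get(a, empty) & gen.get(b, empty)) + alpha * len(
            com.get(a, empty) & com.get(b, empty)
        )
        if score > 0:
            scored.append(((a, b), score))
    # Highest score first; ties broken by the disease pair itself.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical toy networks.
genetic = [("D1", "D2"), ("D2", "D3")]
comorbidity = [("D1", "D4"), ("D3", "D4"), ("D1", "D3")]
result = rank_candidate_links(genetic, comorbidity)
# result == [(("D1", "D3"), 1.5), (("D1", "D4"), 0.5), (("D3", "D4"), 0.5)]
```

Setting `alpha` to 0 reduces this to ordinary common-neighbor link prediction on the genetic network alone; in the toy example, the pairs (D1, D4) and (D3, D4) are then no longer proposed, since their only supporting evidence comes from the co-morbidity layer.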